- Large Disk mini-HOWTO
- Andries Brouwer, aeb@cwi.nl
- v1.0, 960626
-
- All about disk geometry and the 1024 cylinder limit for disks.
-
- 1. The problem
-
- Suppose you have a disk with more than 1024 cylinders. Suppose
- moreover that you have an operating system that uses the BIOS. Then
- you have a problem, because the usual INT13 BIOS interface to disk I/O
- uses a 10-bit field for the cylinder on which the I/O is done, so that
- cylinders 1024 and past are inaccessible.
-
- Fortunately, Linux does not use the BIOS, so there is no problem.
-
- Well, except for two things:
-
- (1) When you boot your system, Linux isn't running yet and cannot save
- you from BIOS problems. This has some consequences for LILO and
- similar boot loaders.
-
- (2) It is necessary for all operating systems that use one disk to
- agree on where the partitions are. In other words, if you use both
- Linux and, say, DOS on one disk, then both must interpret the
- partition table in the same way. This has some consequences for the
- Linux kernel and for fdisk.
-
- Below is a rather detailed description of the relevant points. Note
- that I used the kernel version 2.0.8 source as a reference. Other
- versions may differ a bit.
-
- 2. Booting
-
- When the system is booted, the BIOS reads sector 0 (known as the MBR -
- the Master Boot Record) from the first disk (or from floppy), and
- jumps to the code found there - usually some bootstrap loader. Such
- small bootstrap programs typically have no disk drivers of their own
- and use BIOS services. This means that a Linux kernel can
- only be booted when it is entirely located within the first 1024
- cylinders.
-
- This problem is very easily solved: make sure that the kernel (and
- perhaps other files used during bootup, such as LILO map files) are
- located on a partition that is entirely contained in the first 1024
- cylinders of a disk that the BIOS can access - probably this means the
- first or second disk.
-
- Another point is that the boot loader and the BIOS must agree as to
- the disk geometry. It may help to give LILO the `linear' option.
- More details below.
-
- 3. Disk geometry and partitions
-
- If you have several operating systems on your disks, then each uses
- one or more disk partitions. A disagreement on where these partitions
- are may have catastrophic consequences.
-
- The MBR contains a partition table describing where the (primary)
- partitions are. There are 4 table entries, for 4 primary partitions,
- and each looks like
-
- struct partition {
-     unsigned char active;    /* 0x80: bootable, 0: not bootable */
-     unsigned char begin[3];  /* CHS for first sector */
-     unsigned char type;      /* partition type */
-     unsigned char end[3];    /* CHS for last sector */
-     unsigned int start;      /* 32 bit sector number (counting from 0) */
-     unsigned int length;     /* 32 bit number of sectors */
- };
-
- (where CHS stands for Cylinder/Head/Sector).
-
- Thus, this information is redundant: the location of a partition is
- given both by the 24-bit begin and end fields, and by the 32-bit start
- and length fields.
-
- Linux only uses the start and length fields, and can therefore handle
- partitions of not more than 2^32 sectors, that is, partitions of at
- most 2 TB. That is two hundred times larger than the disks available
- today, so maybe it will be enough for the next ten years or so.
-
- Unfortunately, the BIOS INT13 call uses CHS coded in three bytes, with
- 10 bits for the cylinder number, 8 bits for the head number, and 6
- bits for the track sector number. Possible cylinder numbers are
- 0-1023, possible head numbers are 0-255, and possible track sector
- numbers are 1-63 (yes, sectors on a track are counted from 1, not 0).
- With these 24 bits one can address 8455716864 bytes (7.875 GB), two
- hundred times larger than the disks available in 1983.
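-
- As an illustration, a three-byte CHS field can be unpacked as shown
- below. This is only a sketch: it assumes the conventional layout
- (head in the first byte, sector in the low six bits of the second
- byte, the two high cylinder bits in the top of the second byte and
- the remaining eight in the third), and the names chs and decode_chs
- are invented for this example.
-
```c
#include <stdint.h>

/* Hypothetical helper type: one decoded cylinder/head/sector triple. */
struct chs {
    unsigned cyl;   /* 0-1023 */
    unsigned head;  /* 0-255 */
    unsigned sect;  /* 1-63 (sectors are counted from 1) */
};

/* Unpack a 3-byte CHS field, assuming the layout described above. */
struct chs decode_chs(const uint8_t b[3])
{
    struct chs r;

    r.head = b[0];
    r.sect = b[1] & 0x3f;                           /* low 6 bits */
    r.cyl  = ((unsigned)(b[1] & 0xc0) << 2) | b[2]; /* 2 high bits + 8 low bits */
    return r;
}
```
-
- With all 24 bits set this yields cylinder 1023, head 255, sector 63,
- the last address the encoding can express.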
-
- Even more unfortunately, the standard IDE interface allows 256
- sectors/track, 65536 cylinders and 16 heads. This in itself allows
- access to 2^37 = 137438953472 bytes (128 GB), but combined with the
- BIOS restriction to 63 sectors and 1024 cylinders only 528482304 bytes
- (504 MB) remain addressable.
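-
- The three limits quoted above can be checked with a few lines of
- arithmetic. The sketch below assumes 512-byte sectors; the function
- names are made up for this example.
-
```c
#include <stdint.h>

/* Bytes addressable with the given maxima, assuming 512-byte sectors. */
uint64_t addressable_bytes(uint64_t cyls, uint64_t heads, uint64_t sects)
{
    return cyls * heads * sects * 512;
}

uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Each field is bounded by the smaller of the BIOS and the IDE maximum. */
uint64_t combined_limit(void)
{
    /* BIOS INT13: 1024 cylinders, 256 heads, 63 sectors/track */
    /* IDE:       65536 cylinders,  16 heads, 256 sectors/track */
    return addressable_bytes(min_u64(1024, 65536),
                             min_u64(256, 16),
                             min_u64(63, 256));
}
```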
-
- This is not enough for present-day disks, and people resort to all
- kinds of trickery, both in hardware and in software.
-
- 4. Translation and Disk Managers
-
- Nobody is interested in what the `real' geometry of a disk is.
- Indeed, the number of sectors per track often is variable - there are
- more sectors per track close to the outer rim of the disk - so there
- is no `real' number of sectors per track. For the user it is best to
- regard a disk as just a linear array of sectors numbered 0, 1, ...,
- and leave it to the controller to find out where a given sector lives
- on the disk.
-
- This linear numbering is known as LBA. The linear address belonging
- to (c,h,s) for a disk with geometry (C,H,S) is c*H*S + h*S + (s-1).
- All SCSI controllers speak LBA, and some IDE controllers do.
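-
- The conversion in both directions might be written as follows. This
- is a sketch of the formula above, not code taken from any driver;
- the type and function names are invented for this example.
-
```c
#include <assert.h>
#include <stdint.h>

struct chs_addr { uint32_t c, h, s; };

/* (c,h,s) -> linear sector number, for H heads and S sectors per track. */
uint32_t chs_to_lba(struct chs_addr a, uint32_t H, uint32_t S)
{
    assert(a.s >= 1 && a.s <= S && a.h < H);  /* sectors count from 1 */
    return (a.c * H + a.h) * S + (a.s - 1);
}

/* Inverse mapping: linear sector number -> (c,h,s). */
struct chs_addr lba_to_chs(uint32_t lba, uint32_t H, uint32_t S)
{
    struct chs_addr a;

    a.s = lba % S + 1;
    a.h = (lba / S) % H;
    a.c = lba / (S * H);
    return a;
}
```
-
- Note that only H and S enter the conversion; the number of
- cylinders C plays no role.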
-
- If the BIOS converts the 24-bit (c,h,s) to LBA and feeds that to a
- controller that understands LBA, then again 7.875 GB is addressable.
- Not enough for all disks, but still an improvement. Note that here
- CHS, as used by the BIOS, no longer has any relation to `reality'.
-
- Something similar works when the controller doesn't speak LBA but the
- BIOS knows about translation. (In the setup this is often indicated
- as `Large'.) Now the BIOS will present a geometry (C',H',S') to the
- operating system, and use (C,H,S) while talking to the disk
- controller. Usually S = S', C' = C/N and H' = H*N, where N is the
- smallest power of two that will ensure C' <= 1024 (so that least
- capacity is wasted by the rounding down in C' = C/N). Again, this
- allows access of up to 7.875 GB.
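-
- The `Large' rule just described can be sketched as follows. The
- names are invented for this example, and the sketch does not check
- that the resulting H' still fits in the 8-bit head field.
-
```c
struct geom { unsigned long cyls; unsigned heads; unsigned sects; };

/* Translate a physical geometry (C,H,S) into the BIOS view (C',H',S'). */
struct geom large_translate(struct geom phys)
{
    struct geom bios = phys;   /* S' = S */
    unsigned n = 1;

    /* N is the smallest power of two with C/N <= 1024. */
    while (phys.cyls / n > 1024)
        n *= 2;
    bios.cyls = phys.cyls / n; /* C' = C/N, rounded down */
    bios.heads = phys.heads * n;
    return bios;
}
```
-
- For example, a disk reporting (2100,16,63) needs N = 4 and is
- presented as (525,64,63).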
-
- If a BIOS does not know about `Large' or `LBA', then there are
- software solutions around. Disk Managers like OnTrack or EZ-Drive
- replace the BIOS disk handling routines by their own. Often this is
- accomplished by having the disk manager code live in the MBR and
- subsequent sectors (OnTrack calls this code DDO: Dynamic Drive
- Overlay), so that it is booted before any other operating system.
- That is why one may have problems when booting from a floppy when a
- Disk Manager has been installed.
-
- The effect is more or less the same as with a translating BIOS - but
- especially when running several different operating systems on the
- same disk, disk managers can cause a lot of trouble.
-
- Linux has supported OnTrack Disk Manager since version 1.3.14, and
- EZ-Drive since version 1.3.29. Some more details are given below.
-
- 5. Kernel disk translation for IDE disks
-
- If the Linux kernel detects the presence of some disk manager on an
- IDE disk, it will try to remap the disk in the same way this disk
- manager would have done, so that Linux sees the same disk partitioning
- as for example DOS with OnTrack or EZ-Drive. However, NO remapping is
- done when a geometry was specified on the command line - so a
- `hd=cyls,heads,secs' command line option might well kill compatibility
- with a disk manager.
-
- The remapping is done by trying 4, 8, 16, 32, 64, 128, 255 heads
- (keeping H*C constant) until either C <= 1024 or H = 255.
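-
- A sketch of this search, written from the description above rather
- than copied from ide.c, could look like this. The names are invented,
- and leaving a geometry with C <= 1024 untouched is an assumption of
- this example.
-
```c
struct geom { unsigned long cyls; unsigned heads; unsigned sects; };

/* Try successive head counts until C <= 1024 or H = 255. */
struct geom remap_heads(struct geom g)
{
    static const unsigned candidates[] = { 4, 8, 16, 32, 64, 128, 255 };
    unsigned long total = g.cyls * g.heads;  /* H*C, kept constant up to rounding */
    unsigned i;

    if (g.cyls <= 1024)
        return g;                            /* assumption: nothing to remap */
    for (i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        g.heads = candidates[i];
        g.cyls = total / g.heads;
        if (g.cyls <= 1024 || g.heads == 255)
            break;
    }
    return g;
}
```
-
- With 255 heads the product H*C can only be preserved approximately,
- since 255 is not a power of two.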
-
- The details are as follows - subsection headers are the strings
- appearing in the corresponding boot messages. Here and everywhere
- else in this text partition types are given in hexadecimal.
-
- 5.1. EZD
-
- EZ-Drive is detected by the fact that the first primary partition has
- type 55. The geometry is remapped as described above, and the
- partition table from sector 0 is discarded - instead the partition
- table is read from sector 1. Disk block numbers are not changed, but
- writes to sector 0 are redirected to sector 1. This behaviour can be
- changed by recompiling the kernel with
- #define FAKE_FDISK_FOR_EZDRIVE 0 in ide.c.
-
- 5.2. DM6:DDO
-
- OnTrack DiskManager (on the first disk) is detected by the fact that
- the first primary partition has type 54. The geometry is remapped as
- described above and the entire disk is shifted by 63 sectors (so that
- the old sector 63 becomes sector 0). Afterwards a new MBR (with
- partition table) is read from the new sector 0. Of course this shift
- is to make room for the DDO - that is why there is no shift on other
- disks.
-
- 5.3. DM6:AUX
-
- OnTrack DiskManager (on other disks) is detected by the fact that the
- first primary partition has type 51 or 53. The geometry is remapped
- as described above.
-
- 5.4. DM6:MBR
-
- An older version of OnTrack DiskManager is detected not by partition
- type, but by signature. (Test whether the offset found in bytes 2 and
- 3 of the MBR is not more than 430, and the short found at this offset
- equals 0x55AA, and is followed by an odd byte.) Again the geometry is
- remapped as above.
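-
- The four detection rules of sections 5.1-5.4 can be summarized in a
- sketch like the one below. It assumes that the partition table starts
- at byte 446 of the MBR, that 16-bit values are stored little-endian,
- and that the type tests come before the signature test; it also
- ignores the first disk / other disk distinction. All names are
- invented for this example.
-
```c
#include <stdint.h>

enum dm_kind { DM_NONE, DM_EZD, DM_DDO, DM_AUX, DM_MBR };

/* Classify a 512-byte MBR according to the rules described above. */
enum dm_kind detect_disk_manager(const uint8_t mbr[512])
{
    uint8_t type = mbr[446 + 4];  /* type byte of the first primary partition */
    unsigned off = mbr[2] | ((unsigned)mbr[3] << 8);

    if (type == 0x55)
        return DM_EZD;            /* EZ-Drive */
    if (type == 0x54)
        return DM_DDO;            /* OnTrack, disk carrying the DDO */
    if (type == 0x51 || type == 0x53)
        return DM_AUX;            /* OnTrack, other disks */
    if (off <= 430) {
        /* older OnTrack: 0x55AA at the given offset, then an odd byte */
        unsigned sig = mbr[off] | ((unsigned)mbr[off + 1] << 8);
        if (sig == 0x55AA && (mbr[off + 2] & 1))
            return DM_MBR;
    }
    return DM_NONE;
}
```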
-
- 5.5. PTBL
-
- Finally, there is a test that tries to deduce a translation from the
- start and end values of the primary partitions: If some partition has
- start and end cylinder less than 256, and start and end sector number
- 1 and 63, respectively, and end heads 31, 63 or 127, then, since it is
- customary to end partitions on a cylinder boundary, and since moreover
- the IDE interface uses at most 16 heads, it is conjectured that a BIOS
- translation is active, and the geometry is remapped to use 32, 64 or
- 128 heads, respectively. (Maybe there is a flaw here, and genhd.c
- should not have tested the high order two bits of the cylinder
- number?) However, no remapping is done when the current idea of the
- geometry already has 63 sectors per track and at least as many heads
- (since this probably means that a remapping was done already).
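-
- For a single partition the test reads roughly as follows. This is a
- sketch of the rule as stated here, not the genhd.c code; the
- structure and function names are invented for this example.
-
```c
/* CHS values of one partition table entry, already decoded. */
struct part_chs {
    unsigned begin_cyl, begin_sect;
    unsigned end_cyl, end_head, end_sect;
};

/* Returns the conjectured number of heads (32, 64 or 128),
 * or 0 when no remapping should be done. */
unsigned ptbl_guess_heads(struct part_chs p,
                          unsigned cur_heads, unsigned cur_sects)
{
    unsigned heads;

    if (p.begin_cyl >= 256 || p.end_cyl >= 256)
        return 0;
    if (p.begin_sect != 1 || p.end_sect != 63)
        return 0;
    if (p.end_head != 31 && p.end_head != 63 && p.end_head != 127)
        return 0;
    heads = p.end_head + 1;
    /* Already 63 sectors/track and at least as many heads: leave alone. */
    if (cur_sects == 63 && cur_heads >= heads)
        return 0;
    return heads;
}
```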
-
- 6. Consequences
-
- What does all of this mean? For Linux users only one thing: that they
- must make sure that LILO and fdisk use the right geometry where
- `right' is defined for fdisk as the geometry used by the other
- operating systems on the same disk, and for LILO as the geometry that
- will enable successful interaction with the BIOS at boot time.
- (Usually these two coincide.)
-
- How does fdisk know about the geometry? It asks the kernel, using the
- HDIO_GETGEO ioctl. But the user can override the geometry
- interactively or on the command line.
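-
- A program can make the same query itself. The sketch below assumes
- a Linux system with the kernel headers installed; the wrapper name
- get_geometry is invented for this example, while HDIO_GETGEO and
- struct hd_geometry come from <linux/hdreg.h>.
-
```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

/* Ask the kernel for the geometry of a disk device.
 * Returns 0 and fills *geo on success, -1 on any failure. */
int get_geometry(const char *device, struct hd_geometry *geo)
{
    int fd = open(device, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = ioctl(fd, HDIO_GETGEO, geo);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```
-
- After a successful call on a device such as /dev/hda, the fields
- geo->cylinders, geo->heads and geo->sectors hold the kernel's answer.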
-
- How does LILO know about the geometry? It asks the kernel, using the
- HDIO_GETGEO ioctl. But the user can override the geometry using the
- `disk=' option. One may also give the `linear' option to LILO; it
- will then store LBA addresses instead of CHS addresses in its map
- file, and find out the geometry to use at boot time (by using INT 13
- Function 8 to ask for the drive geometry).
-
- How does the kernel know what to answer? Well, first of all, the user
- may have specified an explicit geometry with a `hd=cyls,heads,secs'
- command line option. And otherwise the kernel will ask the hardware.
-
- 6.1. IDE details
-
- Let me elaborate. The IDE driver has four sources for information
- about the geometry. The first (G_user) is the one specified by the
- user on the command line. The second (G_bios) is the BIOS Fixed Disk
- Parameter Table (for first and second disk only) that is read on
- system startup, before the switch to 32-bit mode. The third (G_phys)
- and fourth (G_log) are returned by the IDE controller as a response to
- the IDENTIFY command - they are the `physical' and `current logical'
- geometries.
-
- The driver in turn needs two values for the geometry: G_fdisk, which
- is returned by a HDIO_GETGEO ioctl, and G_used, which is actually
- used for doing I/O. Both G_fdisk and G_used are initialized to G_user
- if given, to G_bios when this information is present according to
- CMOS, and to G_phys otherwise.
- If G_log looks reasonable then G_used is set to that. Otherwise, if
- G_used is unreasonable and G_phys looks reasonable then G_used is set
- to G_phys. Here `reasonable' means that the number of heads is in the
- range 1-16.
-
- To say this in other words: the command line overrides the BIOS, and
- will determine what fdisk sees, but if it specifies a translated
- geometry (with more than 16 heads), then for kernel I/O it will be
- overridden by output of the IDENTIFY command.
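-
- The selection rules of this section can be sketched as follows. The
- code is a paraphrase of the description above, not the driver source;
- the `valid' flag and all names are invented for this example.
-
```c
struct geom { int valid; unsigned cyls, heads, sects; };

/* `Reasonable' means present and with 1-16 heads. */
int reasonable(struct geom g)
{
    return g.valid && g.heads >= 1 && g.heads <= 16;
}

/* Derive G_fdisk (reported by HDIO_GETGEO) and G_used (used for I/O). */
void choose_ide_geometry(struct geom g_user, struct geom g_bios,
                         struct geom g_phys, struct geom g_log,
                         struct geom *g_fdisk, struct geom *g_used)
{
    /* Initial value: command line, else BIOS table, else IDENTIFY. */
    struct geom init = g_user.valid ? g_user
                     : g_bios.valid ? g_bios
                     : g_phys;

    *g_fdisk = init;
    *g_used = init;
    if (reasonable(g_log))
        *g_used = g_log;
    else if (!reasonable(*g_used) && reasonable(g_phys))
        *g_used = g_phys;
}
```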
-
- 6.2. SCSI details
-
- The situation for SCSI is slightly different, as the SCSI commands
- already use logical block numbers, so a `geometry' is entirely
- irrelevant for actual I/O. However, the format of the partition table
- is still the same, so fdisk has to invent some geometry, and also uses
- HDIO_GETGEO here - indeed, fdisk does not distinguish between IDE and
- SCSI disks. As one can see from the detailed description below, the
- various drivers each invent a somewhat different geometry. Indeed,
- one big mess.
-
- If you are not using DOS or so, then avoid all extended translation
- settings, and just use 64 heads, 32 sectors per track (for a nice,
- convenient 1 MB per cylinder), if possible, so that no problems arise
- when you move the disk from one controller to another. Some SCSI disk
- drivers (aha152x, pas16, ppa, qlogicfas, qlogicisp) are so nervous
- about DOS compatibility that they will not allow a Linux-only system
- to use more than about 8 GB. This is a bug.
-
- What is the real geometry? The easiest answer is that there is no
- such thing. And if there were, you wouldn't want to know, and you
- should certainly NEVER, EVER tell fdisk or LILO or the kernel about
- it. It is strictly a matter between the SCSI controller and the disk.
- Let
- me repeat that: only silly people tell fdisk/LILO/kernel about the
- true SCSI disk geometry.
-
- But if you are curious and insist, you might ask the disk itself.
- There is the important command READ CAPACITY that will give the total
- size of the disk, and there is the MODE SENSE command, that in the
- Rigid Disk Drive Geometry Page (page 04) gives the number of cylinders
- and heads (this is information that cannot be changed), and in the
- Format Page (page 03) gives the number of bytes per sector, and
- sectors per track. This latter number is typically dependent upon the
- notch, and the number of sectors per track varies - the outer tracks
- have more sectors than the inner tracks. The Linux program scsiinfo
- will give this information. There are many details and complications,
- and it is clear that nobody (probably not even the operating system)
- wants to use this information. Moreover, as long as we are only
- concerned about fdisk and LILO, one typically gets answers like
- C/H/S=4476/27/171 - values that cannot be used by fdisk because the
- partition table reserves only 10, 8 and 6 bits for C, H and S,
- respectively.
-
- Then where does the kernel HDIO_GETGEO get its information from?
- Well, either from the SCSI controller, or by making an educated guess.
- Some drivers seem to think that we want to know `reality', but of
- course we only want to know what the DOS or OS/2 FDISK (or Adaptec
- AFDISK, etc) will use.
-
- Note that Linux fdisk needs the numbers H and S of heads and sectors
- per track to convert LBA sector numbers into c/h/s addresses, but the
- number C of cylinders does not play a role in this conversion. Some
- drivers use (C,H,S) = (1023,255,63) to signal that the drive capacity
- is at least 1023*255*63 sectors. This is unfortunate, since it does
- not reveal the actual size, and will limit the users of most fdisk
- versions to about 8 GB of their disks - a real limitation in these
- days.
-
- In the description below, M denotes the total disk capacity, and C, H,
- S the number of cylinders, heads and sectors per track. It suffices
- to give H, S if we regard C as defined by M / (H*S).
-
- By default, H=64, S=32.
-
- aha1740, dtc, g_NCR5380, t128, wd7000:
- H=64, S=32.
-
- aha152x, pas16, ppa, qlogicfas, qlogicisp:
- H=64, S=32 unless C > 1024, in which case H=255, S=63, C =
- min(1023, M/(H*S)). (Thus C is truncated, and H*S*C is not an
- approximation to the disk capacity M. This will confuse most
- versions of fdisk.) The ppa.c code uses M+1 instead of M and
- says that due to a bug in sd.c M is off by 1.
-
- advansys:
- H=64, S=32 unless C > 1024 and moreover the `> 1 GB' option in
- the BIOS is enabled, in which case H=255, S=63.
-
- aha1542:
- Ask the controller which of two possible translation schemes is
- in use, and use either H=255, S=63 or H=64, S=32. In the former
- case there is a boot message "aha1542.c: Using extended bios
- translation".
-
- aic7xxx:
- H=64, S=32 unless C > 1024, and moreover either the "extended"
- boot parameter was given, or the `extended' bit was set in the
- SEEPROM or BIOS, in which case H=255, S=63.
-
- buslogic:
- H=64, S=32 unless C >= 1024, and moreover extended translation
- was enabled on the controller, in which case if M < 2^22 then
- H=128, S=32; otherwise H=255, S=63. However, after making this
- choice for (C,H,S), the partition table is read, and if for one
- of the three possibilities (H,S) = (64,32), (128,32), (255,63)
- the value endH=H-1 is seen somewhere then that pair (H,S) is
- used, and a boot message is printed "Adopting Geometry from
- Partition Table".
-
- fdomain:
- Find the geometry information in the BIOS Drive Parameter Table,
- or read the partition table and use H=endH+1, S=endS for the
- first partition, provided it is nonempty, or use H=64, S=32 for
- M < 2^21 (1 GB), H=128, S=63 for M < 63*2^17 (3.9 GB) and H=255,
- S=63 otherwise.
-
- in2000:
- Use the first of (H,S) = (64,32), (64,63), (128,63), (255,63)
- that will make C <= 1024. In the last case, truncate C at 1023.
-
- seagate:
- Read C,H,S from the disk. (Horrors!) If C or S is too large,
- then put S=17, H=2 and double H until C <= 1024. This means
- that H will be set to 0 if M > 128*1024*17 (1.1 GB). This is a
- bug.
-
- ultrastor and u14_34f:
- One of three mappings ((H,S) = (16,63), (64,32), (64,63)) is
- used depending on the controller mapping mode.
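-
- Three of the rules listed above, written out as a sketch: the
- default, the truncating rule given for aha152x and friends, and the
- in2000 rule. M is the capacity in sectors. The code follows the
- descriptions in this list, not the driver sources, and the function
- names are invented for this example.
-
```c
struct geom { unsigned long cyls; unsigned heads, sects; };

/* Default: H=64, S=32, C = M/(H*S). */
struct geom geom_default(unsigned long m)
{
    struct geom g = { m / (64 * 32), 64, 32 };
    return g;
}

/* aha152x, pas16, ppa, qlogicfas, qlogicisp: C is truncated at 1023. */
struct geom geom_truncating(unsigned long m)
{
    struct geom g = geom_default(m);

    if (g.cyls > 1024) {
        g.heads = 255;
        g.sects = 63;
        g.cyls = m / (255 * 63);
        if (g.cyls > 1023)
            g.cyls = 1023;     /* H*S*C no longer approximates M */
    }
    return g;
}

/* in2000: first (H,S) pair that gives C <= 1024, else (255,63). */
struct geom geom_in2000(unsigned long m)
{
    static const unsigned hs[3][2] = { {64, 32}, {64, 63}, {128, 63} };
    struct geom g;
    int i;

    for (i = 0; i < 3; i++) {
        g.heads = hs[i][0];
        g.sects = hs[i][1];
        g.cyls = m / (g.heads * g.sects);
        if (g.cyls <= 1024)
            return g;
    }
    g.heads = 255;
    g.sects = 63;
    g.cyls = m / (255 * 63);
    if (g.cyls > 1023)
        g.cyls = 1023;         /* last case: truncate */
    return g;
}
```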
-
- If the driver does not specify the geometry, we fall back on an
- educated guess using the partition table, or using the total disk
- capacity.
-
- Look at the partition table. Since by convention partitions end on a
- cylinder boundary, we can, given end = (endC,endH,endS) for any
- partition, just put H = endH+1 and S = endS. (Recall that sectors are
- counted from 1.) More precisely, the following is done. If there is
- a nonempty partition, pick the partition with the largest beginC. For
- that partition, look at end+1, computed both by adding start and
- length and by assuming that this partition ends on a cylinder
- boundary. If both values agree, or if endC = 1023 and start+length is
- an integral multiple of (endH+1)*endS, then assume that this partition
- really was aligned on a cylinder boundary, and put H = endH+1 and S =
- endS. If this fails, either because there are no partitions, or
- because they have strange sizes, then look only at the disk capacity
- M. Algorithm: put H = M/(62*1024) (rounded up), S = M/(1024*H)
- (rounded up), C = M/(H*S) (rounded down). This has the effect of
- producing a (C,H,S) with C at most 1024 and S at most 62.
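-
- Both fallbacks can be sketched as follows. The partition test is
- shown for a single, already selected partition (choosing the one
- with the largest beginC is left out), M is the capacity in sectors,
- and all names are invented for this example.
-
```c
struct geom { unsigned long cyls, heads, sects; };

unsigned long div_up(unsigned long a, unsigned long b)
{
    return (a + b - 1) / b;
}

/* Guess from the capacity alone. */
struct geom guess_from_capacity(unsigned long m)
{
    struct geom g = { 0, 0, 0 };

    if (m == 0)
        return g;                              /* nothing to guess from */
    g.heads = div_up(m, 62UL * 1024);          /* rounded up */
    g.sects = div_up(m, 1024 * g.heads);       /* rounded up */
    g.cyls  = m / (g.heads * g.sects);         /* rounded down */
    return g;
}

/* Returns 1 and sets *heads, *sects if the partition appears to end
 * on a cylinder boundary; returns 0 otherwise. */
int guess_from_partition(unsigned long start, unsigned long length,
                         unsigned end_c, unsigned end_h, unsigned end_s,
                         unsigned long *heads, unsigned long *sects)
{
    unsigned long cyl_size = (unsigned long)(end_h + 1) * end_s;
    unsigned long lba_end = start + length;                /* end+1 from start/length */
    unsigned long chs_end = (unsigned long)(end_c + 1) * cyl_size; /* end+1 from CHS */

    if (cyl_size == 0)
        return 0;
    if (lba_end == chs_end ||
        (end_c == 1023 && lba_end % cyl_size == 0)) {
        *heads = end_h + 1;
        *sects = end_s;
        return 1;
    }
    return 0;
}
```
-
- For a disk of 4194304 sectors (2 GB) the capacity rule gives
- (C,H,S) = (1009,67,62).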
-
-